DATA PROJECT

Angelica Gamboa and Gwendolyn Espinoza

2025-03-28

INTRODUCTION

QUESTIONS

  1. What are the leading causes of death since 2001?
  2. Which state had the most deaths in the US?
  3. What was the leading cause of death in California?
  4. How has the number of heart disease deaths changed over time?
  5. How has the number of cancer deaths changed over time?
  6. We know the age-adjusted death rate for the data, but what does the rate look like when we exclude “all causes”?
  7. What is the death rate by cause and state?
  8. What is the age-adjusted death rate for heart disease?

Unfiltered Data

## Rows: 10,868
## Columns: 6
## $ Year                    <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
## $ X113.Cause.Name         <chr> "Accidents (unintentional injuries) (V01-X59,Y…
## $ Cause.Name              <chr> "Unintentional injuries", "Unintentional injur…
## $ State                   <chr> "United States", "Alabama", "Alaska", "Arizona…
## $ Deaths                  <int> 169936, 2703, 436, 4184, 1625, 13840, 3037, 20…
## $ Age.adjusted.Death.Rate <dbl> 49.4, 53.8, 63.7, 56.2, 51.8, 33.2, 53.6, 53.2…
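
The glimpse above shows that the `State` column holds a national "United States" row alongside the individual states. As a minimal sketch of how that row could be separated out, the toy data frame below copies the first four rows visible in the output; the object names (`deaths`, `national`, `states_only`) are hypothetical and not taken from the original analysis.

```r
library(dplyr)

# Toy rows copied from the glimpse() output above
# (2017, unintentional injuries); the real dataset has 10,868 rows.
deaths <- tibble(
  Year = c(2017L, 2017L, 2017L, 2017L),
  Cause.Name = rep("Unintentional injuries", 4),
  State = c("United States", "Alabama", "Alaska", "Arizona"),
  Deaths = c(169936L, 2703L, 436L, 4184L),
  Age.adjusted.Death.Rate = c(49.4, 53.8, 63.7, 56.2)
)

# The "United States" row is a national aggregate, not a state,
# so it is kept apart from the state-level rows.
national <- filter(deaths, State == "United States")
states_only <- filter(deaths, State != "United States")

glimpse(states_only)
```

Keeping the national row separate makes it harder to mix the aggregate with state-level figures in later summaries.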

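Variables in the dataset:

  1. Year (integer): the year of the record.
  2. X113.Cause.Name (character): the full cause name, including its code range.
  3. Cause.Name (character): the short cause name, for example "Unintentional injuries".
  4. State (character): the state name; the column also contains a "United States" row.
  5. Deaths (integer): the number of deaths.
  6. Age.adjusted.Death.Rate (double): the age-adjusted death rate.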

Leading Causes of Death Since 2001

CLRD stands for Chronic Lower Respiratory Diseases.

Total Deaths in Each State

Leading Cause of Death in California

How has the number of heart disease deaths changed over time?

## # A tibble: 19 × 2
##     Year Total_Heart_Deaths
##    <int>              <int>
##  1  1999            1450384
##  2  2000            1421520
##  3  2001            1400284
##  4  2002            1393894
##  5  2003            1370178
##  6  2004            1304972
##  7  2005            1304182
##  8  2006            1263272
##  9  2007            1232134
## 10  2008            1233656
## 11  2009            1198826
## 12  2010            1195378
## 13  2011            1193154
## 14  2012            1199422
## 15  2013            1222210
## 16  2014            1228696
## 17  2015            1267684
## 18  2016            1270520
## 19  2017            1294914
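
Because the `State` column includes a "United States" row, a yearly sum taken over every row would count the national total on top of the state totals. The sketch below shows one way to total heart disease deaths by year from the state rows only. It assumes the cause label is "Heart disease" (not confirmed by the source), and the counts in `toy` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: counts are invented for illustration only.
# "Heart disease" is an assumed value of Cause.Name.
toy <- tibble(
  Year = c(2016L, 2016L, 2016L, 2017L, 2017L, 2017L),
  Cause.Name = "Heart disease",
  State = rep(c("United States", "State A", "State B"), 2),
  Deaths = c(30L, 10L, 20L, 36L, 12L, 24L)
)

# Drop the national aggregate row before summing, so each death
# is counted once per year.
heart_by_year <- toy |>
  filter(Cause.Name == "Heart disease", State != "United States") |>
  group_by(Year) |>
  summarise(Total_Heart_Deaths = sum(Deaths))

print(heart_by_year)
```

In the toy data the state rows add up to the national row, so the filtered totals match the "United States" figures instead of doubling them.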

How has the number of cancer deaths changed over time?

## # A tibble: 19 × 2
##     Year Total_Deaths
##    <int>        <int>
##  1  1999      1099676
##  2  2000      1106182
##  3  2001      1107536
##  4  2002      1114542
##  5  2003      1113804
##  6  2004      1107776
##  7  2005      1118624
##  8  2006      1119776
##  9  2007      1125750
## 10  2008      1130938
## 11  2009      1135256
## 12  2010      1149486
## 13  2011      1153382
## 14  2012      1165246
## 15  2013      1169762
## 16  2014      1183400
## 17  2015      1191860
## 18  2016      1196076
## 19  2017      1198216

We know the age-adjusted death rate for the data, but what does the rate look like when we exclude “all causes”?

What is the age-adjusted death rate?
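
As a hedged aside, an age-adjusted death rate is usually defined by direct standardization: the age-specific death rates are averaged using the age shares of a fixed standard population as weights. The formula below is that general definition, not something stated in the source, which does not say which standard population the dataset uses.

```latex
\[
\text{AADR} = \sum_{i} w_i \, r_i,
\qquad
w_i = \frac{P_i^{\text{std}}}{\sum_{j} P_j^{\text{std}}},
\qquad
r_i = \frac{d_i}{n_i}
\]
```

Here $d_i$ is the number of deaths in age group $i$, $n_i$ is the population of that age group, and $P_i^{\text{std}}$ is the standard population in that age group. Such rates are typically reported per 100,000 people, which allows states with different age structures to be compared.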

What is the death rate by cause and state?
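
A minimal sketch of how a rate by cause and state could be summarised is shown below. It assumes the rate is averaged across years within each state and cause, which the source does not confirm; the rates in `toy` and the `Mean_Rate` column name are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: the rates are invented for illustration only.
toy <- tibble(
  Year = c(2016L, 2017L, 2016L, 2017L),
  State = c("Alabama", "Alabama", "Alaska", "Alaska"),
  Cause.Name = "Unintentional injuries",
  Age.adjusted.Death.Rate = c(50, 54, 60, 64)
)

# Average the age-adjusted rate within each state and cause.
# .groups = "drop" returns an ungrouped tibble.
rate_by_state_cause <- toy |>
  group_by(State, Cause.Name) |>
  summarise(
    Mean_Rate = mean(Age.adjusted.Death.Rate),
    .groups = "drop"
  )

print(rate_by_state_cause)
```

Setting `.groups` explicitly makes the grouping of the result clear to later steps such as plotting.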
